\[~\]
\[~\]
We are provided with some preprocessed data coming from Autism Brain Image Data Exchange Project. This dataset is partitioned in two subsets: the Autism Spectrum Disorder (ASD) and the Typically Developed (TD) sets.
\[~\]
\[~\]
As shown in the image above, each subset contains the records of 12 patients. A sample of 145 records is associated to each person. Formally, this sample can be treated as an IID sample defined as:
\[ D^{(i)}_n = \{X_1, ..., X_n\} \sim f_X(\cdot) \hspace{1em} \text{ for a certain } f_X \text{, having } n=145 \]
Each \(X_i\) is defined as a random vector \(X_i = [X_i(1), ..., X_i(D)]^T \in \mathbb{R}^\mathtt{D}\), where \(\mathtt{D}=116\) is the number of the features, i.e the number of the Regions of Interest (ROIs) of the person’s brain. We also assume that each patient within a group can be himself treated as an independent sample from a certain \(f_D(\cdot)\). Thus, moving bottom-up in the data tree, each group formally corresponds to:
\[ D_{12} = \{D^{(1)}_n, D^{(2)}_n, ..., D^{(12)}_n\} \]
It is worth to note that the indipendence assumptions at the two levels of the data tree are quite crucial. Thanks to them, we have different ways of pooling the data:
From this point on, we follow the second approach, computing the correlation matrix \(\big(\hat{\rho}_{j,k}\big)^{(i)} \in [-1,1]^{\mathtt{D}\times \mathtt{D}}\) associated to each \(D^{(i)}_n\), \(i=1,...,12\), and taking the mean of them; by the IID property of \(\{D^{(1)}_n, ..., D^{(12)}_n\}\), we have that \(\Big\{\big(\hat{\rho}_{j,k}\big)^{(i)}\Big\}_{i=1}^{12}\) is still independent, being function of each \(D^{(i)}_n\). This choice has many advantages, which will be pointed out further in the discussion.
As pointed out in [1], the Pearson correlation coefficient is non-additive, then it is not mathematically meaningful to take the mean of them as they are. Hence, before averaging, we apply the Fisher Z-transform in order to smoothly map the interval \([-1,1]\) into \(\mathbb{R}\):
\[ \Big\{\hat{Z}^{(i)}_{j,k}\Big\}_{i=1}^{12} = \Big\{atan\left(\hat{\rho}^{(i)}_{j,k}\right)\Big\}_{i=1}^{12} \hspace{1em} \text{where} \hspace{.5em} j,k \in \{1,...,\mathtt{D}\},\hspace{.4em} j\neq k \]
Recall that, for each \(\hat{Z}^{(i)}_{j,k}\), holds:
\[ \hat{Z}^{(i)}_{j,k} \hspace{.4em} \dot{\sim} \hspace{.4em} N_1\Big(atan(\hat{\rho}^{(i)}_{j,k}), \frac{1}{n - 3}\Big), \hspace{1em} n=145 \]
Finally, we take:
\[ \overline{Z}_{12}(j,k) = \frac{1}{12}\sum_{i=1}^{12}\hat{Z}^{(i)}_{j,k} \text{ ,} \]
remembering that \(\hat{Z}^{(i)}_{j,k}\) are \(IID\), being function of \(\hat{\rho}^{(i)}_{j,k}\).
Here the convenience of this approach becomes evident; indeed, by the CLT, the sample mean is asymptotically normal so that:
\[ \frac{\overline{Z}_{12}(j,k) - Z_{j,k}}{\hat{\sigma}_{j,k}/\sqrt{12}} \hspace{.4em} \dot{\sim} \hspace{.4em} N(0,1), \] where \(\hat{\sigma}_{j,k} = \frac{1}{\sqrt{145 - 3}}\). In building the asymptotic confidence interval for \(Z_{j,k}\), these results reveal pretty relevant.
\[~\]
\[~\]
After loading the data, we define a function called get.cor.ROIs in order to return a dataframe of correlations between the all possible combinations between the ROIs. Inside this function we transform the correlation values with the Fisher Z-transform as defined in the previous section.
\[~\]
# define a function in order to catch each person and return the correlation matrix between ROIs
get.cor.ROIs <- function(...){
args <- list(...)
if(!is.null(args[["person"]])) {
args$frame <- args[["group"]][[args[["person"]]]]
}
n <- ncol(args[["frame"]])
nms <- names(args[["frame"]])
# takes efficiently the combinations, this is useful for large dataset!
V1 <- rep(nms[1:(n-1)], seq(from=n-1, to = 1, by = -1))
V2 <- unlist(lapply(1:(n-1), function(i)(nms[(i+1):n])))
corr.ROIs <- data.frame(ROI_1=V1, ROI_2=V2) # create the correlation in which I will insert the values
corr.ROIs$z.pearson.corr <- apply(corr.ROIs, 1, function(row) {
atanh(cor(args[["frame"]][row["ROI_1"]], args[["frame"]][row["ROI_2"]])) # takes the sets of columns; apply the Z-transform to the correlation matrix
})
return(corr.ROIs)
}
\[~\]
Recall, for these 12 people of each group, the function defined above. We put into the call function the type of group and the corresponding i-th person.
\[~\]
# create the matrix correlations for all patients
for(person in names(asd_sel))
assign(paste("z.trans.", person, sep=""), get.cor.ROIs(group = asd_sel, person = person))
for(person in names(td_sel))
assign(paste("z.trans.", person, sep=""), get.cor.ROIs(group = td_sel, person = person))
\[~\]
After computing the correlation dataframes, for all the patients within each group, we pool the data as hinted at the beginning.
\[~\]
# create a unique matrix for each group
unique.matrix <- function(group) {
mtx <- z.trans.caltech_0051472[c("ROI_1", "ROI_2")] # create matrix with combinations
mtx$cor.mean <- apply(mtx, 1, function(row) {
values <- c()
for(person in names(group)) {
frame <- get(paste("z.trans.", person, sep="")) # match the address of the string with the real variable
elem <- frame[(frame[["ROI_1"]] == row["ROI_1"]) & (frame[["ROI_2"]] == row["ROI_2"]), "z.pearson.corr"] # select the correlation
values <- c(values, elem)
}
mean(values) # take the mean!
})
return(mtx)
}
\[~\]
We finally get the average correlation matrix for the groups ASD and TD:
\[~\]
# call the creation of unique matrix
asd_sel.dt <- unique.matrix(asd_sel); head(asd_sel.dt)
## ROI_1 ROI_2 cor.mean
## 1 2001 2002 0.51188381
## 2 2001 2101 0.02486222
## 3 2001 2102 -0.21203874
## 4 2001 2111 -0.05246567
## 5 2001 2112 -0.19052649
## 6 2001 2201 0.14638295
td_sel.dt <- unique.matrix(td_sel); head(td_sel.dt)
## ROI_1 ROI_2 cor.mean
## 1 2001 2002 0.416005628
## 2 2001 2101 0.086652336
## 3 2001 2102 -0.334109327
## 4 2001 2111 0.007217533
## 5 2001 2112 -0.150770771
## 6 2001 2201 0.164410598
\[~\]
\[~\]
library(ggplot2)
library(hrbrthemes)
halfHeatmap <- function(x, title) {
# Give extreme colors:
xx <- x[order(x$cor.mean, decreasing = TRUE),][1:80,]
return(ggplot(xx, aes(ROI_1, ROI_2, fill = cor.mean)) +
ggtitle(title) +
geom_tile() +
guides(fill = guide_legend(title = "Z-transformed correlation", title.position = "top")) +
scale_fill_gradient(low="pink", high="darkorchid4") +
theme_minimal() +
theme(axis.line = element_line(color = "gray", size = 1),
panel.grid = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
plot.title = element_text(hjust = 0.5)
))
}
halfHeatmap(asd_sel.dt, title = "Top 80 positively correlated ROIs in ASD group (Z-scale)")
\[~\]
halfHeatmap(td_sel.dt, title = "Top 80 positively correlated ROIs in TD group (Z-scale)")
\[~\]
hist(asd_sel.dt$cor.mean, probability = T, breaks = 50, col = "orange", main = "Histogram of ASD group", xlab = " mean", ylab = "count", border = "white")
\[~\]
hist(td_sel.dt$cor.mean, probability = T, breaks = 50, col = "steelblue", main = "Histogram of TD group", xlab = " mean", ylab = "count", border = "white")
\[~\]
hist(rbind(asd_sel.dt[["cor.mean"]], td_sel.dt[["cor.mean"]]), probability = T, breaks = 50, col = "purple", main = "Histogram of ASD+TD group", xlab = " mean", ylab = "count", border = "white")
\[~\]
\[~\]
Start to find the quantile value for these two groups. Since working on the Z-scale, the threshold is also in the Z-scale.
\[~\]
z.t <- apply(rbind(asd_sel.dt["cor.mean"], td_sel.dt["cor.mean"]), 2, quantile, probs=0.8)
paste("the threshold at the 80th percentile for the subjects is: ", round(as.numeric(z.t), 3), sep = "")
## [1] "the threshold at the 80th percentile for the subjects is: 0,125"
\[~\]
We compute the confidence interval for each \(\overline{Z}_{12}(j,k)\). Starting from:
\[ \frac{\overline{Z}_{12}(j,k) - Z_{j,k}}{\hat{\sigma}_{j,k} / \sqrt{12}}, \\ \text{ where } \hat{\sigma}_{j,k} = \frac{1}{\sqrt{145 - 3}} \]
and applying the Bonferroni’s correction:
\[ \frac{\alpha}{m}, \hspace{1em} \text{ where } \hspace{.5em} m = \begin{pmatrix} \mathtt{D} \\ 2 \end{pmatrix}, \hspace{.5em} \mathtt{D}=116 \]
\[~\]
we end up with:
\[~\]
\[ C_{12}^{Z(j,k)}\Big(\frac{\alpha}{m}\Big) = \bigg[\overline{Z}_{12}(j,k) \mp z_{\alpha/2m} \cdot \frac{\hat{\sigma}_{j,k}}{\sqrt{12}}\bigg] \]
\[~\]
conf.int <- function(dt, m = 1, alpha = 0.05) { # m : Family-Wise Error Rate correction (Bonferroni)
# Asymptotic variance
n.samp <- 145 # number of IID samples on which we computed the correlation matrix
sigma <- 1 / sqrt(n.samp - 3) # standard deviation of the (j,k)-th Z-transformed Pearson coefficent
n.p <- 12 # number of people in each of the two groups
se <- sigma / sqrt(n.p) # standard error of the estimator Z12
# Confidence interval
cint <- sapply(dt$cor.mean, function(z.i) {
list(confidence_interval = c(lb = z.i - se * qnorm(1 - alpha / (2*m)),
ub = z.i + se * qnorm(1 - alpha / (2*m))))
})
return(cint)
}
\[~\]
Finally:
\[~\]
## Compute the confidence interval
m <- (116 * 115) / 2 # number of intervals
asd_sel.dt$cint <- conf.int(asd_sel.dt, m = m); asd_sel.dt$cint[1:3]
## $confidence_interval
## lb ub
## 0,4033775 0,6203901
##
## $confidence_interval
## lb ub
## -0,08364405 0,13336849
##
## $confidence_interval
## lb ub
## -0,3205450 -0,1035325
td_sel.dt$cint <- conf.int(td_sel.dt, m = m); td_sel.dt$cint[1:3]
## $confidence_interval
## lb ub
## 0,3074994 0,5245119
##
## $confidence_interval
## lb ub
## -0,02185393 0,19515860
##
## $confidence_interval
## lb ub
## -0,4426156 -0,2256031
\[~\]
Show an example of confidence interval between two ROIs:
\[~\]
plotCint <- function(ROI.1, ROI.2, dt) {
hist(dt$cor.mean, probability = T, breaks = 50, col = "pink", main = "Interval of a data point", xlab = " mean", ylab = "count", border = "deeppink")
corresponding.interval <- function(ROI.1, ROI.2, group) {
cint <- group[(group$ROI_1 == ROI.1) & (group$ROI_2 == ROI.2), "cint"]
lb <- cint[[1]]["lb"]; ub <- cint[[1]]["ub"]
estimate.corr <- group[(group$ROI_1 == ROI.1) & (group$ROI_2 == ROI.2), "cor.mean"]
info <- list("lb" = lb, "ub" = ub, "cor.mean" = estimate.corr)
return(info)
}
cint <- corresponding.interval(ROI.1 = ROI.1, ROI.2 = ROI.2, dt)
rect(cint$lb, -10, cint$ub, +10, border = NA, col = viridis::viridis(1, .3))
points(cint$cor.mean, 0, col = "green", pch = 19)
abline(v = cint$cor.mean, lwd = 3, col = viridis::viridis(1, .1))
text(cint$cor.mean, 2, expression(hat(p)[mean]), pos = 4)
}
plotCint(ROI.1 = "2001", ROI.2 = "2002", asd_sel.dt)
\[~\]
plotCint(ROI.1 = "2001", ROI.2 = "2002", td_sel.dt)
\[~\]
We are now ready to estimate the adjacency matrix of the association graphs:
\[~\]
## Estimate the adjacency matrix given the Z-transformed Pearson correlation coefficient
get.est.adj.mat <- function(dt, dec.rule, t) {
# create the adj matrix
nameVals <- sort(unique(unlist(dt[1:2]))) # set up storage matrix, get names for row and columns
# construct 0 matrix of correct dimensions with row and column names
adj_mat <- matrix(0, length(nameVals), length(nameVals), dimnames = list(nameVals, nameVals))
# fill in the matrix with matrix indexing on row and column names
adj_mat[as.matrix(dt[c("ROI_1", "ROI_2")])] <- 0
# put edge according to the decision rule
# TODO: SUBSTITUTE THE FOR LOOP WITH A MORE EFFICIENT SOLUTION
for(idx in rownames(dt)){
if( dec.rule(dt, idx, t) ) {
adj_mat[as.character(dt[idx, "ROI_1"]), as.character(dt[idx, "ROI_2"])] <- 1
}
}
return(adj_mat)
}
## check if the two intervals int1 and int2 are overlapping
are.joint <- function(int1, int2) return((int1[1] <= int2[2]) && (int2[1] <= int1[2]))
## check the simple threshold condition
check.threshold <- function(dt, idx, t) {
val <- abs(dt[idx, "cor.mean"])
return(val >= t)
}
## check the confidence interval condition
check.conf.int <- function(dt, idx, t) {
c.int <- dt[idx, "cint"]$confidence_interval
t.int <- c(-t, t)
return(!are.joint(c.int, t.int))
}
\[~\]
The association graph taking into account the confidence interval with the Bonferroni’s correction is:
\[~\]
# Adjacency matrix of the estimated graph with the Bonferroni's correction
adj_mat_asd_bonf <- get.est.adj.mat(asd_sel.dt, check.conf.int, z.t)
adj_mat_td_bonf <- get.est.adj.mat(td_sel.dt, check.conf.int, z.t)
# Estimated graph
G.asd.bonf <- graph_from_adjacency_matrix(adj_mat_asd_bonf, mode = "undirected")
G.td.bonf <- graph_from_adjacency_matrix(adj_mat_td_bonf, mode = "undirected")
\[~\]
Plot the graphs with the threshold, with Bonferroni’s multiplicity adjustment:
\[~\]
# To plot the correlation graph(s)
plot(G.asd.bonf, vertex.size = 4, edge.width = .3, vertex.color = "orchid", label.col = "black",
main = "ASD Association Graph", sub = "(Bonferroni's multiplicity adjustment)",
layout = layout.kamada.kawai, vertex.label.cex = 0.7, vertex.label.dist = .8)
plot(G.td.bonf, vertex.size = 4, edge.width = .3, vertex.color = "dodgerblue", label.col = "black",
main = "TD Association Graph", sub = "(Bonferroni's multiplicity adjustment)",
layout = layout.kamada.kawai, vertex.label.cex = 0.7, vertex.label.dist = .8)
\[~\]
G.diff.bonf <- difference(G.asd.bonf, G.td.bonf)
plot(G.diff.bonf, vertex.size = 4, edge.width = .8, vertex.color = "orange", label.col = "black",
main = TeX("Co-activated ROIs in $(\\mathrm{G}^{ASD} - \\mathrm{G}^{TD})$"), sub = "(Bonferroni's multiplicity adjustment)",
layout = layout.kamada.kawai, vertex.label.cex = 0.7, vertex.label.dist = .8)
\[~\]
The association graph taking into only the threshold is:
\[~\]
# Adjacency matrix of the estimated graph with the Bonferroni's correction
adj_mat_asd_thr <- get.est.adj.mat(asd_sel.dt, check.threshold, z.t)
adj_mat_td_thr <- get.est.adj.mat(td_sel.dt, check.threshold, z.t)
# Estimated graph
G.asd.thr <- graph_from_adjacency_matrix(adj_mat_asd_thr, mode = "undirected")
G.td.thr <- graph_from_adjacency_matrix(adj_mat_td_thr, mode = "undirected")
\[~\]
Plot the graphs with the threshold, without adjustment:
\[~\]
plot(G.asd.thr, vertex.size = 4, edge.width = .3, vertex.color = "orchid", label.col = "black",
main = "ASD Association Graph", sub = "(Threshold only)",
layout = layout.kamada.kawai, vertex.label.cex = 0.7, vertex.label.dist = .8)
plot(G.td.thr, vertex.size = 4, edge.width = .3, vertex.color = "dodgerblue", label.col = "black",
main = "TD Association Graph", sub = "(Threshold only)",
layout = layout.kamada.kawai, vertex.label.cex = 0.7, vertex.label.dist = .8)